Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 110148 |
| Missing cells | 36827 |
| Missing cells (%) | 1.7% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 51.0 MiB |
| Average record size in memory | 485.9 B |
Variable types
| NUM | 8 |
|---|---|
| CAT | 7 |
| BOOL | 5 |
Reproduction
| Analysis started | 2020-07-30 06:34:38.459667 |
|---|---|
| Analysis finished | 2020-07-30 06:38:15.758096 |
| Version | pandas-profiling v2.5.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Message | Type |
|---|---|
| app_date has a high cardinality: 120 distinct values | High cardinality |
| default has 36349 (33.0%) missing values | Missing |
| decline_app_cnt has 91471 (83.0%) zeros | Zeros |
| bki_request_cnt has 28908 (26.2%) zeros | Zeros |
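The percentages in these warnings can be checked against the observation count reported under Dataset statistics. The following is a minimal sketch of that arithmetic; the dictionary keys and variable names are illustrative and are not part of the report.

```python
# Observation count taken from the "Dataset statistics" table.
N_OBSERVATIONS = 110148

# Raw counts quoted in the warnings (labels are illustrative).
counts = {
    "default (missing)": 36349,
    "decline_app_cnt (zeros)": 91471,
    "bki_request_cnt (zeros)": 28908,
}

# Share of all observations, as a percentage rounded to one decimal place.
shares = {name: round(100 * c / N_OBSERVATIONS, 1) for name, c in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The rounded shares agree with the 33.0%, 83.0% and 26.2% shown in the warnings.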
df_index
Real number (ℝ≥0)
| Distinct count | 73799 |
|---|---|
| Unique (%) | 67.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30719.7228 |
|---|---|
| Minimum | 0 |
| Maximum | 73798 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2753.35 |
| Q1 | 13768 |
| median | 27536.5 |
| Q3 | 46261.25 |
| 95-th percentile | 68290.65 |
| Maximum | 73798 |
| Range | 73798 |
| Interquartile range (IQR) | 32493.25 |
Descriptive statistics
| Standard deviation | 20443.7233 |
|---|---|
| Coefficient of variation (CV) | 0.6654917896 |
| Kurtosis | -0.8889186034 |
| Mean | 30719.7228 |
| Median Absolute Deviation (MAD) | 17135.70993 |
| Skewness | 0.4419381927 |
| Sum | 3383716027 |
| Variance | 417945822.4 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 36348.5 73798. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 2047 | 2 | < 0.1% | |
| 17556 | 2 | < 0.1% | |
| 3229 | 2 | < 0.1% | |
| 1180 | 2 | < 0.1% | |
| 15515 | 2 | < 0.1% | |
| 13466 | 2 | < 0.1% | |
| 11417 | 2 | < 0.1% | |
| 9368 | 2 | < 0.1% | |
| 23703 | 2 | < 0.1% | |
| 21654 | 2 | < 0.1% | |
| Other values (73789) | 110128 | > 99.9% |
| Value | Count | Frequency (%) | |
| 0 | 2 | < 0.1% | |
| 1 | 2 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 3 | 2 | < 0.1% | |
| 4 | 2 | < 0.1% |
| Value | Count | Frequency (%) | |
| 73798 | 1 | < 0.1% | |
| 73797 | 1 | < 0.1% | |
| 73796 | 1 | < 0.1% | |
| 73795 | 1 | < 0.1% | |
| 73794 | 1 | < 0.1% |
client_id
Real number (ℝ≥0)
| Distinct count | 110148 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 55074.5 |
|---|---|
| Minimum | 1 |
| Maximum | 110148 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5508.35 |
| Q1 | 27537.75 |
| median | 55074.5 |
| Q3 | 82611.25 |
| 95-th percentile | 104640.65 |
| Maximum | 110148 |
| Range | 110147 |
| Interquartile range (IQR) | 55073.5 |
Descriptive statistics
| Standard deviation | 31797.13306 |
|---|---|
| Coefficient of variation (CV) | 0.5773476484 |
| Kurtosis | -1.2 |
| Mean | 55074.5 |
| Median Absolute Deviation (MAD) | 27537 |
| Skewness | 0 |
| Sum | 6066346026 |
| Variance | 1011057671 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e+00 1.10148e+05], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 97541 | 1 | < 0.1% | |
| 93447 | 1 | < 0.1% | |
| 70920 | 1 | < 0.1% | |
| 72969 | 1 | < 0.1% | |
| 66826 | 1 | < 0.1% | |
| 68875 | 1 | < 0.1% | |
| 79116 | 1 | < 0.1% | |
| 81165 | 1 | < 0.1% | |
| 75022 | 1 | < 0.1% | |
| Other values (110138) | 110138 | > 99.9% |
| Value | Count | Frequency (%) | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 110148 | 1 | < 0.1% | |
| 110147 | 1 | < 0.1% | |
| 110146 | 1 | < 0.1% | |
| 110145 | 1 | < 0.1% | |
| 110144 | 1 | < 0.1% |
app_date
Categorical
| Distinct count | 120 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| 18MAR2014 | 1491 | 1.4% | |
| 19MAR2014 | 1363 | 1.2% | |
| 17MAR2014 | 1350 | 1.2% | |
| 31MAR2014 | 1317 | 1.2% | |
| 07APR2014 | 1296 | 1.2% | |
| 02APR2014 | 1291 | 1.2% | |
| 11MAR2014 | 1245 | 1.1% | |
| 04MAR2014 | 1242 | 1.1% | |
| 01APR2014 | 1239 | 1.1% | |
| 11FEB2014 | 1233 | 1.1% | |
| Other values (110) | 97081 | 88.1% |
Length
| Max length | 9 |
|---|---|
| Mean length | 9 |
| Min length | 9 |
| Value | Count | Frequency (%) | |
| Decimal_Number | 10 | 52.6% | |
| Uppercase_Letter | 9 | 47.4% |
| Value | Count | Frequency (%) | |
| Common | 10 | 52.6% | |
| Latin | 9 | 47.4% |
| Value | Count | Frequency (%) | |
| ASCII | 19 | 100.0% |
education
Categorical
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 478 |
| Missing (%) | 0.4% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| SCH | 57998 | 52.7% | |
| GRD | 34768 | 31.6% | |
| UGR | 14748 | 13.4% | |
| PGR | 1865 | 1.7% | |
| ACD | 291 | 0.3% | |
| (Missing) | 478 | 0.4% |
Length
| Max length | 3 |
|---|---|
| Mean length | 3 |
| Min length | 3 |
| Value | Count | Frequency (%) | |
| Uppercase_Letter | 9 | 81.8% | |
| Lowercase_Letter | 2 | 18.2% |
| Value | Count | Frequency (%) | |
| Latin | 11 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 11 | 100.0% |
sex
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| F | 61836 | 56.1% | |
| M | 48312 | 43.9% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Uppercase_Letter | 2 | 100.0% |
| Value | Count | Frequency (%) | |
| Latin | 2 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 2 | 100.0% |
age
Real number (ℝ≥0)
| Distinct count | 52 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 39.24940988 |
|---|---|
| Minimum | 21 |
| Maximum | 72 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 30 |
| median | 37 |
| Q3 | 48 |
| 95-th percentile | 60 |
| Maximum | 72 |
| Range | 51 |
| Interquartile range (IQR) | 18 |
Descriptive statistics
| Standard deviation | 11.51806263 |
|---|---|
| Coefficient of variation (CV) | 0.2934582371 |
| Kurtosis | -0.7260121183 |
| Mean | 39.24940988 |
| Median Absolute Deviation (MAD) | 9.695175763 |
| Skewness | 0.4802480831 |
| Sum | 4323244 |
| Variance | 132.6657668 |
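Several of the figures reported for age are derived from the others. As a rough sketch (variable names are illustrative, and the inputs are the rounded values printed in the tables above), the relationships can be reproduced as follows:

```python
# Values copied from the age tables in this report.
n_observations = 110148
mean = 39.24940988
std = 11.51806263
minimum, maximum = 21, 72
q1, q3 = 30, 48

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum # range
total = mean * n_observations   # sum, recovered from the mean

print(cv, variance, iqr, value_range, total)
```

Because the inputs are rounded, the recomputed values match the reported ones only to within rounding error.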
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[21. 21.5 22.5 23.5 24.5 ... 67.5 68.5 69.5 70.5 72. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 31 | 4084 | 3.7% | |
| 28 | 4035 | 3.7% | |
| 30 | 4035 | 3.7% | |
| 27 | 3964 | 3.6% | |
| 29 | 3940 | 3.6% | |
| 26 | 3780 | 3.4% | |
| 32 | 3773 | 3.4% | |
| 34 | 3548 | 3.2% | |
| 33 | 3499 | 3.2% | |
| 35 | 3386 | 3.1% | |
| Other values (42) | 72104 | 65.5% |
| Value | Count | Frequency (%) | |
| 21 | 1262 | 1.1% | |
| 22 | 1415 | 1.3% | |
| 23 | 2295 | 2.1% | |
| 24 | 2780 | 2.5% | |
| 25 | 3292 | 3.0% |
| Value | Count | Frequency (%) | |
| 72 | 2 | < 0.1% | |
| 71 | 6 | < 0.1% | |
| 70 | 60 | 0.1% | |
| 69 | 110 | 0.1% | |
| 68 | 261 | 0.2% |
car
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| N | 74290 | 67.4% | |
| Y | 35858 | 32.6% |
car_type
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| N | 89140 | 80.9% | |
| Y | 21008 | 19.1% |
decline_app_cnt
Real number (ℝ≥0)
| Distinct count | 24 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2732051422 |
|---|---|
| Minimum | 0 |
| Maximum | 33 |
| Zeros | 91471 |
| Zeros (%) | 83.0% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 33 |
| Range | 33 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.799099319 |
|---|---|
| Coefficient of variation (CV) | 2.924905851 |
| Kurtosis | 101.2380998 |
| Mean | 0.2732051422 |
| Median Absolute Deviation (MAD) | 0.4537594429 |
| Skewness | 6.493006696 |
| Sum | 30093 |
| Variance | 0.6385597216 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 7.5 9.5 11.5 14.5 33. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 0 | 91471 | 83.0% | |
| 1 | 12500 | 11.3% | |
| 2 | 3622 | 3.3% | |
| 3 | 1365 | 1.2% | |
| 4 | 606 | 0.6% | |
| 5 | 255 | 0.2% | |
| 6 | 156 | 0.1% | |
| 7 | 58 | 0.1% | |
| 8 | 37 | < 0.1% | |
| 9 | 29 | < 0.1% | |
| Other values (14) | 49 | < 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 91471 | 83.0% | |
| 1 | 12500 | 11.3% | |
| 2 | 3622 | 3.3% | |
| 3 | 1365 | 1.2% | |
| 4 | 606 | 0.6% |
| Value | Count | Frequency (%) | |
| 33 | 1 | < 0.1% | |
| 30 | 1 | < 0.1% | |
| 24 | 1 | < 0.1% | |
| 22 | 1 | < 0.1% | |
| 21 | 1 | < 0.1% |
good_work
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| 0 | 91917 | 83.4% | |
| 1 | 18231 | 16.6% |
score_bki
Real number (ℝ)
| Distinct count | 102618 |
|---|---|
| Unique (%) | 93.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -1.904535049 |
|---|---|
| Minimum | -3.62458632 |
| Maximum | 0.19977285 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | -3.62458632 |
|---|---|
| 5-th percentile | -2.696247185 |
| Q1 | -2.26043367 |
| median | -1.92082293 |
| Q3 | -1.567888152 |
| 95-th percentile | -1.055049083 |
| Maximum | 0.19977285 |
| Range | 3.82435917 |
| Interquartile range (IQR) | 0.6925455175 |
Descriptive statistics
| Standard deviation | 0.4993974924 |
|---|---|
| Coefficient of variation (CV) | -0.2622149131 |
| Kurtosis | -0.1492918934 |
| Mean | -1.904535049 |
| Median Absolute Deviation (MAD) | 0.4026105377 |
| Skewness | 0.1939872976 |
| Sum | -209780.7266 |
| Variance | 0.2493978554 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-3.62458632 -3.40235635 -3.21728287 -3.21663488 -3.15990211 ... -0.6388378 -0.52401387 -0.37380806 -0.15264482 0.19977285], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| -1.77526279 | 517 | 0.5% | |
| -2.1042109 | 454 | 0.4% | |
| -2.22500363 | 424 | 0.4% | |
| -2.16966378 | 375 | 0.3% | |
| -2.02410005 | 278 | 0.3% | |
| -1.92082293 | 270 | 0.2% | |
| -2.38726804 | 238 | 0.2% | |
| -1.52642194 | 207 | 0.2% | |
| -2.44723899 | 207 | 0.2% | |
| -2.2729409 | 176 | 0.2% | |
| Other values (102608) | 107002 | 97.1% |
| Value | Count | Frequency (%) | |
| -3.62458632 | 1 | < 0.1% | |
| -3.59798083 | 1 | < 0.1% | |
| -3.58258691 | 1 | < 0.1% | |
| -3.57419708 | 1 | < 0.1% | |
| -3.56422406 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 0.19977285 | 2 | < 0.1% | |
| 0.1980699 | 1 | < 0.1% | |
| 0.18882044 | 1 | < 0.1% | |
| 0.18361297 | 1 | < 0.1% | |
| 0.16854933 | 1 | < 0.1% |
bki_request_cnt
Real number (ℝ≥0)
| Distinct count | 40 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.00500236 |
|---|---|
| Minimum | 0 |
| Maximum | 53 |
| Zeros | 28908 |
| Zeros (%) | 26.2% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 3 |
| 95-th percentile | 6 |
| Maximum | 53 |
| Range | 53 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.266925867 |
|---|---|
| Coefficient of variation (CV) | 1.130635012 |
| Kurtosis | 23.16785082 |
| Mean | 2.00500236 |
| Median Absolute Deviation (MAD) | 1.552358663 |
| Skewness | 3.082728152 |
| Sum | 220847 |
| Variance | 5.138952887 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 16.5 19.5 24.5 33.5 53. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 0 | 28908 | 26.2% | |
| 1 | 27295 | 24.8% | |
| 2 | 20481 | 18.6% | |
| 3 | 13670 | 12.4% | |
| 4 | 8406 | 7.6% | |
| 5 | 4960 | 4.5% | |
| 6 | 2500 | 2.3% | |
| 7 | 1292 | 1.2% | |
| 8 | 735 | 0.7% | |
| 9 | 459 | 0.4% | |
| Other values (30) | 1442 | 1.3% |
| Value | Count | Frequency (%) | |
| 0 | 28908 | 26.2% | |
| 1 | 27295 | 24.8% | |
| 2 | 20481 | 18.6% | |
| 3 | 13670 | 12.4% | |
| 4 | 8406 | 7.6% |
| Value | Count | Frequency (%) | |
| 53 | 1 | < 0.1% | |
| 47 | 1 | < 0.1% | |
| 46 | 1 | < 0.1% | |
| 45 | 1 | < 0.1% | |
| 41 | 1 | < 0.1% |
region_rating
Real number (ℝ≥0)
| Distinct count | 7 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 56.75118931 |
|---|---|
| Minimum | 20 |
| Maximum | 80 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 40 |
| Q1 | 50 |
| median | 50 |
| Q3 | 60 |
| 95-th percentile | 80 |
| Maximum | 80 |
| Range | 60 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 13.06592289 |
|---|---|
| Coefficient of variation (CV) | 0.2302317017 |
| Kurtosis | -0.6334345368 |
| Mean | 56.75118931 |
| Median Absolute Deviation (MAD) | 10.90200861 |
| Skewness | 0.4778692262 |
| Sum | 6251030 |
| Variance | 170.7183409 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[20. 35. 45. 55. 65. 75. 80.], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 50 | 40981 | 37.2% | |
| 60 | 23999 | 21.8% | |
| 40 | 17947 | 16.3% | |
| 80 | 17170 | 15.6% | |
| 70 | 9304 | 8.4% | |
| 30 | 434 | 0.4% | |
| 20 | 313 | 0.3% |
| Value | Count | Frequency (%) | |
| 20 | 313 | 0.3% | |
| 30 | 434 | 0.4% | |
| 40 | 17947 | 16.3% | |
| 50 | 40981 | 37.2% | |
| 60 | 23999 | 21.8% |
| Value | Count | Frequency (%) | |
| 80 | 17170 | 15.6% | |
| 70 | 9304 | 8.4% | |
| 60 | 23999 | 21.8% | |
| 50 | 40981 | 37.2% | |
| 40 | 17947 | 16.3% |
home_address
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| 2 | 59591 | 54.1% | |
| 1 | 48688 | 44.2% | |
| 3 | 1869 | 1.7% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Decimal_Number | 3 | 100.0% |
| Value | Count | Frequency (%) | |
| Common | 3 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 3 | 100.0% |
work_address
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| 3 | 67113 | 60.9% | |
| 2 | 30761 | 27.9% | |
| 1 | 12274 | 11.1% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Decimal_Number | 3 | 100.0% |
| Value | Count | Frequency (%) | |
| Common | 3 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 3 | 100.0% |
income
Real number (ℝ≥0)
| Distinct count | 1207 |
|---|---|
| Unique (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41012.64854 |
|---|---|
| Minimum | 1000 |
| Maximum | 1000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 860.7 KiB |
Quantile statistics
| Minimum | 1000 |
|---|---|
| 5-th percentile | 10000 |
| Q1 | 20000 |
| median | 30000 |
| Q3 | 48000 |
| 95-th percentile | 100000 |
| Maximum | 1000000 |
| Range | 999000 |
| Interquartile range (IQR) | 28000 |
Descriptive statistics
| Standard deviation | 45399.73505 |
|---|---|
| Coefficient of variation (CV) | 1.10696911 |
| Kurtosis | 100.1746159 |
| Mean | 41012.64854 |
| Median Absolute Deviation (MAD) | 23532.68019 |
| Skewness | 7.503020095 |
| Sum | 4517461211 |
| Variance | 2061135943 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1000. 1050. 4990. 5050. 5150. ... 590000. 615000. 999499.5 999999.5 1000000. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 30000 | 10437 | 9.5% | |
| 25000 | 9090 | 8.3% | |
| 20000 | 8174 | 7.4% | |
| 40000 | 7383 | 6.7% | |
| 50000 | 6742 | 6.1% | |
| 35000 | 6319 | 5.7% | |
| 15000 | 5874 | 5.3% | |
| 60000 | 3818 | 3.5% | |
| 45000 | 3670 | 3.3% | |
| 18000 | 2732 | 2.5% | |
| Other values (1197) | 45909 | 41.7% |
| Value | Count | Frequency (%) | |
| 1000 | 6 | < 0.1% | |
| 1100 | 1 | < 0.1% | |
| 1200 | 1 | < 0.1% | |
| 1500 | 2 | < 0.1% | |
| 1700 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1000000 | 13 | < 0.1% | |
| 999999 | 4 | < 0.1% | |
| 999000 | 2 | < 0.1% | |
| 990000 | 1 | < 0.1% | |
| 950000 | 4 | < 0.1% |
sna
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| 1 | 70681 | 64.2% | |
| 4 | 17481 | 15.9% | |
| 2 | 15832 | 14.4% | |
| 3 | 6154 | 5.6% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Decimal_Number | 4 | 100.0% |
| Value | Count | Frequency (%) | |
| Common | 4 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 4 | 100.0% |
first_time
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| 3 | 46588 | 42.3% | |
| 4 | 28017 | 25.4% | |
| 1 | 18296 | 16.6% | |
| 2 | 17247 | 15.7% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Decimal_Number | 4 | 100.0% |
| Value | Count | Frequency (%) | |
| Common | 4 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 4 | 100.0% |
foreign_passport
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 860.7 KiB |
| Value | Count | Frequency (%) | |
| N | 93721 | 85.1% | |
| Y | 16427 | 14.9% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
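The calculation described above can be sketched in plain Python. This is an illustration only, not the code the profiling tool runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the variances cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0]
r_increasing = pearson_r(x, [2 * v + 5 for v in x])   # linear, positive slope
r_decreasing = pearson_r(x, [-3 * v + 1 for v in x])  # linear, negative slope
print(r_increasing, r_decreasing)
```

The two sample series differ in slope and offset, yet both are perfectly linear in x, so r reaches its extremes of +1 and -1. This is the location and scale invariance mentioned in the text.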
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
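A minimal sketch of this rank-based calculation follows. It assumes that tied values receive the average of their ranks, a common convention that the text above does not specify; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic sample data, ρ reaches 1 because the ordering is preserved exactly, while r stays below 1 because the relationship is not linear.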
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
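The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that formula; it is an illustration, and library implementations may instead apply a tie-corrected variant, so results can differ when the data contain ties. The function name `kendall_tau_a` is an assumption of this example.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

In the sample data five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.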
First rows
| | df_index | client_id | app_date | education | sex | age | car | car_type | decline_app_cnt | good_work | score_bki | bki_request_cnt | region_rating | home_address | work_address | income | sna | first_time | foreign_passport | default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25905 | 01FEB2014 | SCH | M | 62 | Y | Y | 0 | 0 | -2.008753 | 1 | 50 | 1 | 2 | 18000 | 4 | 1 | N | 0.0 |
| 1 | 1 | 63161 | 12MAR2014 | SCH | F | 59 | N | N | 0 | 0 | -1.532276 | 3 | 50 | 2 | 3 | 19000 | 4 | 1 | N | 0.0 |
| 2 | 2 | 25887 | 01FEB2014 | SCH | M | 25 | Y | N | 2 | 0 | -1.408142 | 1 | 80 | 1 | 2 | 30000 | 1 | 4 | Y | 0.0 |
| 3 | 3 | 16222 | 23JAN2014 | SCH | F | 53 | N | N | 0 | 0 | -2.057471 | 2 | 50 | 2 | 3 | 10000 | 1 | 3 | N | 0.0 |
| 4 | 4 | 101655 | 18APR2014 | GRD | M | 48 | N | N | 0 | 1 | -1.244723 | 1 | 60 | 2 | 3 | 30000 | 1 | 4 | Y | 0.0 |
| 5 | 5 | 41415 | 18FEB2014 | SCH | M | 27 | Y | N | 0 | 1 | -2.032257 | 0 | 50 | 1 | 1 | 15000 | 2 | 3 | N | 0.0 |
| 6 | 6 | 28436 | 04FEB2014 | SCH | M | 39 | N | N | 0 | 0 | -2.225004 | 0 | 60 | 1 | 2 | 28000 | 1 | 1 | N | 0.0 |
| 7 | 7 | 68769 | 17MAR2014 | SCH | F | 39 | N | N | 0 | 0 | -1.522739 | 1 | 50 | 2 | 3 | 45000 | 3 | 3 | N | 0.0 |
| 8 | 8 | 38424 | 14FEB2014 | SCH | F | 50 | Y | N | 1 | 0 | -1.676061 | 0 | 50 | 1 | 1 | 30000 | 1 | 4 | N | 0.0 |
| 9 | 9 | 4496 | 10JAN2014 | UGR | F | 54 | N | N | 0 | 0 | -2.695176 | 1 | 50 | 2 | 3 | 24000 | 1 | 3 | N | 0.0 |
Last rows
| | df_index | client_id | app_date | education | sex | age | car | car_type | decline_app_cnt | good_work | score_bki | bki_request_cnt | region_rating | home_address | work_address | income | sna | first_time | foreign_passport | default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 110138 | 36339 | 16072 | 23JAN2014 | GRD | F | 28 | N | N | 0 | 0 | -1.651781 | 4 | 60 | 1 | 2 | 13000 | 1 | 3 | N | NaN |
| 110139 | 36340 | 10090 | 17JAN2014 | SCH | F | 53 | Y | N | 0 | 0 | -1.845058 | 2 | 50 | 1 | 2 | 7000 | 1 | 1 | N | NaN |
| 110140 | 36341 | 90435 | 07APR2014 | UGR | F | 48 | N | N | 0 | 0 | -2.066300 | 1 | 60 | 1 | 1 | 27000 | 1 | 4 | N | NaN |
| 110141 | 36342 | 42509 | 19FEB2014 | SCH | F | 58 | Y | Y | 0 | 1 | -1.857117 | 1 | 50 | 2 | 3 | 25000 | 4 | 3 | N | NaN |
| 110142 | 36343 | 72405 | 20MAR2014 | SCH | F | 40 | N | N | 0 | 0 | -2.039905 | 0 | 50 | 2 | 3 | 20000 | 4 | 1 | N | NaN |
| 110143 | 36344 | 83775 | 31MAR2014 | SCH | F | 37 | N | N | 1 | 0 | -1.744976 | 3 | 50 | 2 | 3 | 15000 | 4 | 1 | N | NaN |
| 110144 | 36345 | 106254 | 25APR2014 | GRD | F | 64 | Y | Y | 0 | 0 | -2.293781 | 3 | 60 | 1 | 2 | 200000 | 1 | 4 | N | NaN |
| 110145 | 36346 | 81852 | 30MAR2014 | GRD | M | 31 | N | N | 2 | 0 | -0.940752 | 1 | 50 | 1 | 2 | 60000 | 4 | 2 | N | NaN |
| 110146 | 36347 | 1971 | 07JAN2014 | UGR | F | 27 | N | N | 1 | 0 | -1.242392 | 2 | 80 | 2 | 3 | 30000 | 1 | 1 | N | NaN |
| 110147 | 36348 | 69044 | 17MAR2014 | SCH | M | 38 | N | N | 0 | 0 | -1.507549 | 2 | 50 | 1 | 2 | 15000 | 4 | 2 | N | NaN |